This paper describes the structure determination of nsp3a, the N-terminal domain of the severe acute respiratory syndrome coronavirus (SARS-CoV) nonstructural protein 3. nsp3a exhibits a ubiquitin-like globular fold of residues 1 to 112 and a flexibly extended glutamic acid-rich domain of residues 113 to 183. In addition to the four β-strands and two α-helices that are common to ubiquitin-like folds, the globular domain of nsp3a contains two short helices, a feature that has not previously been observed in these proteins. Nuclear magnetic resonance chemical shift perturbations showed that these unique structural elements are involved in interactions with single-stranded RNA. Structural similarities with proteins involved in various cell-signaling pathways indicate possible roles of nsp3a in viral infection and persistence.


Severe acute respiratory syndrome (SARS) is a viral infec- 
tious disease that has attracted worldwide attention since an 
outbreak in 2003 (26). It has been postulated that the SARS 
coronavirus (SARS-CoV) was introduced to the human pop- 
ulation from animal CoVs (26). CoVs comprise a large group 
of enveloped, positive-sense, single-stranded RNA viruses that 
have been classified in the Nidovirales order. There are three 
groups of CoVs, based on serological cross-reactivity and phy- 
logenetic relatedness. The SARS-CoV is distantly related to 
the group 2 viruses and has been classified in group 2b (38). 

The SARS-CoV genome is one of the largest currently known RNA genomes. It is composed of at least 14 functional open reading frames that encode three classes of proteins, i.e., structural proteins (the S, M, E, N, 3a, 7a, and 7b proteins), nonstructural proteins (nsp1 to nsp16), and the accessory proteins (3b, 6, 8, 9b, and 14) (38). With regard to the nonstructural proteins, translation of the SARS-CoV genome produces two large replicase polyproteins (pp1a and pp1ab), which are processed by two proteases to yield 16 mature nonstructural proteins that mediate RNA replication and processing. Since the SARS outbreak in 2003, knowledge of the structure, activity, and function of some of these proteins has increased considerably (30, 32, 35, 41, 45); however, the biological roles of many of the SARS-CoV proteins remain unknown. In this paper we describe the nuclear magnetic resonance (NMR) structure determination and a preliminary functional characterization of nsp3a, the N-terminal domain of the largest of the nonstructural proteins, nsp3.
SARS-CoV nsp3 is a 213-kDa polypeptide involved in RNA replication and has been proposed to consist of seven domains, nsp3a to nsp3g, which have been identified based on phylogenetic conservation and predicted amino acid secondary structure (38). The biological role of nsp3 is only partially understood, and so far structures have been determined for only two of its domains: nsp3b, which has been described as an ADP-ribose-1″-phosphatase (34), and nsp3d, which is a papain-like protease (PLpro) involved in the proteolytic processing of pp1a and pp1ab. nsp3d contains three domains, two of which are involved directly in proteolysis, while the third one has a ubiquitin-like fold (31).

nsp3a exhibits less than 35% sequence identity with other 
known proteins, and the closest homologues are found in other 
CoVs. The alignment shown in Fig. 1 indicates that group 2a 
CoVs (e.g., murine hepatitis virus and porcine hemagglutinat- 
ing encephalomyelitis virus) exhibit higher similarity with 
nsp3a than proteins from groups 1 (e.g., human coronavirus 
229E) and 3 (e.g., avian infectious bronchitis virus). The 183- 
residue nsp3a domain consists of a C-terminal subdomain of 
residues 113 to 183 that is rich in acidic residues (38% E and 
12% D) and a 112-residue N-terminal subdomain with a more 
homogeneous content of amino acids (Fig. 1). This report 
presents a structural characterization of residues 1 to 183 of 
nsp3a [nsp3a(1-183)] and the structure determination of the 
subdomain nsp3a(1-112) in solution by NMR spectroscopy. 


MATERIALS AND METHODS 


Production of nsp3a. Full-length nsp3a (consisting of residues 1 to 183) and a construct devoid of residues 113 to 183, nsp3a(1-112), were cloned into the expression vector pMH1F (His6 tag; pBAD derivative) and expressed in DL41 Escherichia coli cells with induction at 14°C in 2× YT (yeast extract and tryptone) medium. Each of the two constructs was shown, by one-dimensional (1-D) ¹H NMR, to form a folded globular domain (data not shown). To facilitate expression of samples suitable for NMR structure determination, both constructs were subcloned into pET-25b (Novagen). These plasmids were used to transform E. coli strain BL21-CodonPlus (DE3)-RIL (Stratagene). The expression of uniformly ¹³C,¹⁵N-labeled nsp3a(1-112) was carried out by growing freshly transformed cells in M9 minimal medium containing 1 g/liter ¹⁵NH₄Cl and 4 g/liter D-[¹³C₆]glucose as the sole nitrogen and carbon sources. Cell cultures were grown at 37°C with vigorous shaking to an optical density at 600 nm of 0.8 to 0.9. The temperature was then lowered to 18°C, and after induction with 1 mM isopropyl-β-D-thiogalactopyranoside, the cell cultures were grown for 18 h. The cells were harvested by centrifugation, resuspended in extraction buffer (50 mM sodium phosphate at pH 6.5, 150 mM NaCl, 0.1% Triton X-100, and Complete protease inhibitor tablets [Roche]), and lysed by sonication. The cell debris was removed by centrifugation (20,000 × g for 20 min). For the first purification step, the soluble protein was loaded onto an anion-exchange column (HiTrap Q FF; Amersham) equilibrated with 50 mM sodium phosphate buffer at pH 6.5 containing 150 mM NaCl. The proteins were eluted with a 150 to 1,000 mM NaCl gradient. Fractions containing nsp3a(1-112) were pooled and concentrated to a volume of 10 ml using centrifugal ultrafiltration devices (Millipore). Subsequently, the sample was loaded onto a size exclusion column (Superdex 75; Amersham) equilibrated with 50 mM sodium phosphate buffer (pH 6.5) containing 150 mM NaCl and eluted with the same buffer. The fractions containing nsp3a(1-112) were again pooled and concentrated to a final volume of 550 µl, for a final protein concentration of 1.8 mM.

FIG. 1. (a) Sequence alignment of human SARS-CoV nsp3a(1-112) and the homologous regions from bat SARS-CoV (accession no. AAZ67050), murine hepatitis virus (MHV) (strain A59; accession no. NP_740609), porcine hemagglutinating encephalomyelitis virus (HEV) (strain VW572; accession no. YP_459949), human CoV (hCoV 229E; accession no. NP_835345), and avian infectious bronchitis virus (IBV) (strain Cal99; accession no. AAS00078). The residue numbers at the top correspond to the sequence of the human SARS-CoV and do not account for the insertions shown in the drawing. In each sequence, the residues conserved relative to SARS-CoV nsp3a are in bold. The regular secondary structure elements of SARS-CoV nsp3a are indicated by boxes. (b) Sequence of the subdomain of residues 113 to 183 of human SARS-CoV.

FIG. 2. NMR structure of nsp3a(1-112). (a) Stereo view of the polypeptide backbone of a bundle of 20 energy-minimized CYANA conformers superimposed for minimal RMSD values of the backbone atoms of residues 20 to 108. The N-terminal segment of residues 1 to 19 is flexibly disordered (Fig. 5). (b) Stereo view of a ribbon representation of the conformer with the smallest RMSD relative to the mean coordinates of the ensemble of panel a. In both panels, β-strands are cyan and helices are red. Selected residue positions are indicated in panel a, and the regular secondary structures are identified in panel b.

Production of nucleic acid-free protein for NMR spectroscopy. nsp3a prepared as described in the preceding section copurifies with nucleic acids, as was readily observed in the 1-D ¹H NMR spectrum (see Fig. 8a). Nucleic acid-free samples were obtained by the following modification of the purification procedure. After the anion-exchange chromatography, the sample was kept at 25°C for 18 h. The protein solution was subsequently loaded onto a size exclusion column (Superdex 75; Amersham) equilibrated with 50 mM sodium phosphate buffer (pH 6.5) containing 150 mM NaCl and eluted with the same buffer. Under these conditions, the protein and the nucleic acid eluted separately. The fractions containing nucleic acid-free nsp3a(1-112) were again pooled and concentrated to a final volume of 550 µl, for a final protein concentration of 1 to 2 mM. The 1-D ¹H NMR spectrum of the sample used for the NMR structure determination (see Fig. 8b) confirms the absence of nucleic acids.

NMR spectroscopy. NMR measurements were performed at 298 K with Bruker Avance 600, DRX 700, and Avance 800 spectrometers (Bruker BioSpin, Billerica, MA), equipped with TXI-HCN-z or TXI-HCN-xyz gradient probe heads. Proton chemical shifts were referenced to internal 3-(trimethylsilyl)-1-propanesulfonic acid sodium salt (DSS). The ¹³C and ¹⁵N chemical shifts were referenced indirectly to DSS, using the absolute frequency ratios (42). The following NMR spectra were used to obtain sequence-specific backbone and side chain resonance assignments: 2-D [¹⁵N,¹H]-heteronuclear single-quantum coherence (HSQC), 2-D [¹³C,¹H]-HSQC, 3-D HNCA, 3-D HNCACB, 3-D CBCA(CO)NH, 3-D HNCO, 3-D HC(C)H-total correlation spectroscopy, 3-D ¹⁵N-resolved [¹H,¹H]-total correlation spectroscopy, and 2-D [¹H,¹H]-nuclear Overhauser effect spectroscopy (NOESY).
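Indirect referencing obtains the 0-ppm frequency of each heteronucleus by multiplying the measured ¹H frequency of DSS by a fixed frequency ratio Ξ. The Python sketch below illustrates only that arithmetic. It assumes the Ξ values recommended for biomolecular NMR by IUPAC-IUBMB-IUPAB (0.251449530 for ¹³C relative to DSS, 0.101329118 for ¹⁵N relative to liquid NH₃); the function names and the 600 MHz example are hypothetical and are not taken from the spectrometer software used in this work.

```python
# Assumed frequency ratios (Xi) for indirect referencing, as recommended by
# IUPAC-IUBMB-IUPAB for biomolecular NMR.
XI = {
    "13C": 0.251449530,  # 13C relative to DSS
    "15N": 0.101329118,  # 15N relative to liquid NH3
}


def zero_ppm_frequency(dss_1h_mhz: float, nucleus: str) -> float:
    """Return the 0-ppm frequency (MHz) of `nucleus`, given the absolute
    frequency (MHz) of the DSS 1H methyl signal on the same magnet."""
    return dss_1h_mhz * XI[nucleus]


def to_ppm(freq_mhz: float, ref_mhz: float) -> float:
    """Convert an absolute resonance frequency (MHz) to ppm relative to ref."""
    return (freq_mhz - ref_mhz) / ref_mhz * 1e6


if __name__ == "__main__":
    dss_1h = 600.0  # hypothetical DSS 1H frequency in MHz
    for nucleus in ("13C", "15N"):
        ref = zero_ppm_frequency(dss_1h, nucleus)
        print(f"{nucleus}: 0 ppm at {ref:.6f} MHz")
```

Because every heteronuclear reference is derived from the single DSS line, no separate internal standard is needed for ¹³C or ¹⁵N.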

TABLE 1. Input for the structure calculation and characterization of the bundle of 20 energy-minimized CYANA conformers that represent the NMR structure of nsp3a(1-112)

Parameterᵃ                                         Value
Total no. of NOE upper distance limits             1,888
    Intraresidue                                   400
    Short range                                    637
    Medium range                                   491
    Long range                                     360
No. of dihedral angle constraints                  118
Residual target function value (Å²)                1.88 ± 0.28
Residual no. of NOE violations
    ≥0.1 Å                                         22 ± 4
    Maximum (Å)                                    0.13 ± 0.01
Residual no. of dihedral angle violations
    ≥2.5°                                          1 ± 1
    Maximum (°)                                    2.44 ± 0.82
AMBER energies (kcal/mol)
    Total                                          −3,102.71 ± 80.00
    van der Waals                                  −254.89 ± 15.32
    Electrostatic                                  −3,679.82 ± 82.77
RMSD from ideal geometry
    Bond lengths (Å)                               0.0078 ± 0.0002
    Bond angles (°)                                2.086 ± 0.029
RMSD to the mean coordinates (Å)ᵇ
    Backbone heavy atoms (20-108)                  0.77 ± 0.09
    All heavy atoms (20-108)                       1.02 ± 0.10
Ramachandran plot statisticsᶜ
    Most favored regions (%)                       73
    Additional allowed regions (%)                 24
    Generously allowed regions (%)                 3
    Disallowed regions (%)                         0

ᵃ Except for the six top entries, the average value for the 20 energy-minimized conformers with the lowest residual CYANA target function values and the standard deviation among them are given.
ᵇ The backbone atoms are N, Cα, and C′. The numbers in parentheses identify the residues for which the RMSD was calculated.
ᶜ As determined by PROCHECK (20).

Steady-state ¹⁵N{¹H}-NOEs were measured using transverse relaxation-optimized spectroscopy-based experiments (32, 46) on a Bruker Avance 600 spectrometer with a saturation period of 3.0 s and a total interscan delay of 5.0 s. Diffusion experiments were recorded on a Bruker DRX700 spectrometer using a longitudinal eddy current delay pulse scheme (1), with a diffusion time of 50 ms and sine-shaped gradients of 4.5 ms. The data were processed with TopSpin software (Bruker BioSpin, Billerica, MA).
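For a fixed gradient-pulse length and diffusion time, the signal in such a diffusion experiment decays as ln(I/I₀) = −k·G², with the slope k proportional to the translational diffusion coefficient D. The Python sketch below is a hypothetical helper, not the TopSpin processing used here: it fits k through the origin and converts a ratio of slopes into an apparent mass. That conversion assumes compact spherical molecules obeying Stokes-Einstein behavior (D ∝ M^(−1/3)), an assumption that an elongated protein violates by appearing heavier than it is. All example data are synthetic.

```python
import math


def decay_slope(g_squared, intensities, i0):
    """Least-squares slope k for ln(I/I0) = -k * G**2, fitted through the
    origin. With fixed gradient length and diffusion time, k is proportional
    to the translational diffusion coefficient D."""
    y = [math.log(i / i0) for i in intensities]
    num = sum(x * v for x, v in zip(g_squared, y))
    den = sum(x * x for x in g_squared)
    return -num / den


def apparent_mass(k_sample, k_ref, mass_ref_kda):
    """Apparent mass (kDa) from the ratio of decay slopes, ASSUMING both
    molecules are compact spheres (D ~ M**(-1/3)); the unknown
    proportionality constants cancel in the ratio."""
    return mass_ref_kda * (k_ref / k_sample) ** 3


if __name__ == "__main__":
    g2 = [0.0, 0.1, 0.2, 0.4, 0.8]  # synthetic G^2 values (arbitrary units)
    ref = [100.0 * math.exp(-3.0 * x) for x in g2]     # synthetic standard
    sample = [100.0 * math.exp(-2.5 * x) for x in g2]  # synthetic sample
    k_ref = decay_slope(g2, ref, 100.0)
    k_sample = decay_slope(g2, sample, 100.0)
    print(f"apparent mass: {apparent_mass(k_sample, k_ref, 13.7):.1f} kDa")
```

A smaller slope (slower diffusion) gives a larger apparent mass, so a nonspherical shape and oligomerization both push the estimate upward.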

The interaction of nsp3a(1-112) with single-stranded RNA (ssRNA) was evaluated by comparison of the 2-D [¹⁵N,¹H]-HSQC spectra of nsp3a(1-112) recorded at four nsp3a(1-112):ssRNA2 ratios, i.e., 16:1, 8:1, 4:1, and 2:1. As controls, 2-D [¹⁵N,¹H]-HSQC spectra were obtained after addition of eight uridine units (Octa-U) and Octa-A in fourfold excess with respect to the protein concentration, using otherwise identical conditions. The weighted average of the ¹H and ¹⁵N chemical shift differences, Δδ_av, was calculated as follows (28):

$$\Delta\delta_{\mathrm{av}} = \left\{0.5\left[\Delta\delta(^{1}\mathrm{H}^{\mathrm{N}})^{2} + \left(0.2\,\Delta\delta(^{15}\mathrm{N})\right)^{2}\right]\right\}^{1/2}$$
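Applied per residue, the weighted average can be computed as in the Python sketch below. The formula is the one given above. Selecting "large" perturbations as those exceeding the mean plus one standard deviation is an assumed, commonly used convention rather than a cutoff stated in this paper, and the dictionary layout and function names are hypothetical.

```python
import math
import statistics


def weighted_shift(d_h: float, d_n: float) -> float:
    """Weighted average 1H/15N shift change in ppm:
    {0.5 * [dH**2 + (0.2 * dN)**2]}**(1/2)."""
    return math.sqrt(0.5 * (d_h ** 2 + (0.2 * d_n) ** 2))


def perturbed_residues(free, bound):
    """free and bound map residue number -> (1H shift, 15N shift) in ppm.
    Returns residues whose weighted shift change exceeds mean + 1 SD
    (an assumed cutoff), in sequence order. Needs at least two residues."""
    common = sorted(set(free) & set(bound))
    deltas = {
        r: weighted_shift(bound[r][0] - free[r][0], bound[r][1] - free[r][1])
        for r in common
    }
    values = list(deltas.values())
    cutoff = statistics.mean(values) + statistics.stdev(values)
    return [r for r in common if deltas[r] > cutoff]
```

The factor 0.2 scales the ¹⁵N change down to roughly the spectral spread of the amide ¹H dimension, so that neither nucleus dominates Δδ_av.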

Structure determination. The structure calculation was based on a 3-D ¹⁵N-resolved [¹H,¹H]-NOESY spectrum and on two 3-D ¹³C-resolved [¹H,¹H]-NOESY spectra recorded with the carrier frequency in the aliphatic and the aromatic regions, respectively. All three data sets were recorded with mixing times of 60 ms. In the input for the stand-alone version of the software package ATNOS/CANDID (9, 10), these NOE data were supplemented with the amino acid sequence and the chemical shift lists from the independently obtained sequence-specific resonance assignment (36). Seven cycles of automated NOESY peak picking and NOE cross-peak identification with ATNOS (9), automated NOE assignment with CANDID (10), and structure calculation with the torsion angle dynamics algorithm of CYANA (8) were performed. In the second and subsequent cycles, the intermediate protein structure was used as an additional guide for the interpretation of the NOESY spectra. During the first six cycles, ATNOS/CANDID/CYANA used ambiguous distance restraints. In the final cycle, only distance restraints that could be attributed to a single pair of hydrogen atoms were retained. The 20 conformers with the lowest residual CYANA target function values obtained from the seventh ATNOS/CANDID/CYANA cycle were energy minimized in a water shell with the program OPALp (18, 21), using the AMBER force field (5). The program MOLMOL (19) was used to analyze the ensemble of 20 energy-minimized conformers.
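Two bookkeeping steps of this protocol can be illustrated with the hypothetical Python sketch below: retaining only restraints with a single hydrogen-pair assignment in the final cycle, and sorting the retained restraints by sequence separation as in Table 1. It does not reproduce ATNOS/CANDID/CYANA. The Restraint record is an invented data structure, and the range boundaries (short, |i−j| = 1; medium, 2 to 4; long, ≥5) are an assumed convention that the paper does not state. Whatever the convention, the four classes in Table 1 sum to the total of 1,888 (400 + 637 + 491 + 360).

```python
from collections import Counter
from typing import NamedTuple, Tuple

# A hydrogen atom is identified by (residue number, atom name).
Atom = Tuple[int, str]


class Restraint(NamedTuple):
    """Hypothetical NOE upper-distance restraint with one or more
    candidate hydrogen-pair assignments."""
    assignments: Tuple[Tuple[Atom, Atom], ...]
    upper_limit: float  # in Angstrom


def unambiguous(restraints):
    """Keep only restraints attributable to a single pair of hydrogens."""
    return [r for r in restraints if len(r.assignments) == 1]


def range_class(res_i: int, res_j: int) -> str:
    """Assumed classification by sequence separation |i - j|."""
    sep = abs(res_i - res_j)
    if sep == 0:
        return "intraresidue"
    if sep == 1:
        return "short"
    if sep < 5:
        return "medium"
    return "long"


def count_by_range(restraints):
    """Count the unambiguous restraints in each sequence-separation class."""
    counts = Counter()
    for r in unambiguous(restraints):
        (res_i, _), (res_j, _) = r.assignments[0]
        counts[range_class(res_i, res_j)] += 1
    return counts
```

Ambiguous restraints are dropped before counting, mirroring the final-cycle rule described above.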

Structure validation and data deposition. Analysis of the stereochemical quality of the molecular models was accomplished using the Joint Center for Structural Genomics Validation Central Suite (http://www.jcsg.org), which integrates seven validation tools: Procheck, SFcheck, Prove, ERRAT, WASP, DDQ, and Whatcheck.

Protein stoichiometry determination. Perfluoro-octanoic acid-polyacrylamide gel electrophoresis (PFO-PAGE) was performed according to the method of Ramjeesingh et al. (30). Purified protein samples were mixed 1:1 with PFO loading buffer containing 8% (wt/vol) PFO, 100 mM Tris, 20% (vol/vol) glycerol, and 0.05% (wt/vol) orange G. Samples with protein concentrations of 250 µM, 500 µM, and 1 mM were loaded onto precast 4 to 20% Tris-glycine gels, and electrophoresis was performed with a standard Tris-glycine running buffer (Invitrogen) to which 0.5% (wt/vol) PFO was added. Protein was detected by SYPRO-ruby poststain (Invitrogen).

FIG. 3. Electrostatic surface potential of nsp3a(1-112). Positive and negative electrostatic potential is represented in blue and red, respectively. On the left we show the surface of helices α2, α3, and 3₁₀ and of the loop between strands β3 and β4, which contain a high density of acidic residues (Fig. 1). On the right are shown the surface of helix α1 and strands β1, β2, and β4, which contain mainly neutral and basic residues. Positions of selected charged residues are indicated.

FIG. 4. Superposition of nsp3a(1-112) (green, regular secondary structures that superimpose with nsp3d; yellow, segments not present in nsp3d; gray, other segments) and the ubiquitin-like domain of nsp3d (31) (PDB code 2FE8) (red, regular secondary structures that superimpose with nsp3a; gray, other segments). The structure superposition was performed using the SSM module of Coot (7). Thirty Cα atoms were superimposed with an RMSD value of 2.22 Å, i.e., from nsp3a(1-112) residues 20 to 26, 40 to 46, 49 to 54, 87 to 91, and 100 to 104 and from nsp3d residues 725 to 731, 739 to 745, 748 to 753, 754 to 758, and 773 to 777.

Electrophoretic mobility shift assay (EMSA). Protein samples (twofold dilutions from 128 µM to 1 µM) were mixed with 0.8 µg of RNA substrate in 20 µl of assay buffer containing 150 mM NaCl, 50 mM Tris (pH 8.0), and 5 mM CaCl₂. The RNA sequences used were ssRNA1, AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU, and ssRNA2, GGGGAUAAAA. Samples were incubated at 37°C for 1 h and analyzed by native electrophoresis on precast 6% acrylamide DNA retardation gels (Invitrogen). RNA was detected by SYBR-gold poststain and photographed using a UV light source equipped with a digital camera. Protein was then detected by SYPRO-ruby poststain. Densitometric analysis was performed using a flatbed scanner with ImageJ software (NIH). The mobility shift of RNA at each protein concentration was calculated relative to the maximum shift observed in each experiment. Kd (dissociation constant) values were determined from the midpoints of the fitted titration data (37).
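The normalization and midpoint analysis can be sketched in Python as follows. This is a hypothetical illustration, not the procedure of reference 37: it assumes a single-site hyperbolic binding model, y = B_max·[P]/(Kd + [P]), with protein in large excess over RNA so that total protein approximates free protein. Under that model, half-maximal binding occurs at [P] = Kd. The grid-search fit avoids external libraries; the inputs are per-lane densitometric intensities of bound and free RNA.

```python
def normalized_shift(bound, free):
    """Fraction of RNA shifted in each lane, scaled so that the largest
    shift observed in the experiment equals 1."""
    fractions = [b / (b + f) for b, f in zip(bound, free)]
    top = max(fractions)
    return [x / top for x in fractions]


def fit_kd(protein_um, y, kd_grid):
    """Least-squares fit of y = bmax * P / (kd + P) by scanning kd over
    kd_grid; for each kd, bmax has a closed-form linear solution.
    Returns (kd, bmax); the curve midpoint (y = bmax / 2) lies at P = kd."""
    best = None
    for kd in kd_grid:
        x = [p / (kd + p) for p in protein_um]
        bmax = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        sse = sum((bmax * a - b) ** 2 for a, b in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


if __name__ == "__main__":
    # Twofold dilution series as in Fig. 10b (uM); binding data are synthetic.
    protein = [128, 64, 32, 16, 8, 4, 2]
    synthetic = [p / (20 + p) for p in protein]
    grid = [0.1 * i for i in range(1, 2001)]  # 0.1 to 200 uM
    kd, bmax = fit_kd(protein, synthetic, grid)
    print(f"Kd ~ {kd:.1f} uM, Bmax ~ {bmax:.2f}")
```

Fitting B_max alongside Kd matters here because normalizing to the maximum observed shift does not guarantee that the top concentration reached saturation.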

Nuclease susceptibility assay. nsp3a(1-183) and nsp3a(1-112) were incubated with several different nucleases in order to characterize nucleic acids that copurified with both proteins. RNase-free DNase I (NEB), T7 endonuclease I (NEB), RNase If (NEB), RNase A (Invitrogen), and RNase T1 (Ambion) cleavage assays were thus performed at 37°C for 1 h with the manufacturer's recommended buffer conditions. Digested samples were analyzed by native electrophoresis on precast 6% acrylamide DNA retardation gels (Invitrogen). Nucleic acid was detected by SYBR-gold poststain.

Protein structure accession numbers. The ¹H, ¹³C, and ¹⁵N chemical shifts have been deposited in the BioMagResBank (http://www.bmrb.wisc.edu) under accession number 7029 (36). The atomic coordinates of the bundle of 20 conformers used to represent the solution structure of nsp3a(1-112) and of the conformer closest to the mean coordinates of the ensemble have been deposited in the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) under accession numbers 2GRI and 2IDY, respectively.


FIG. 5. ¹⁵N{¹H}-NOE values plotted as relative intensities versus the amino acid sequence of nsp3a(1-112). Diamonds represent experimental measurements, which are linked by straight lines along the sequence. Gaps represent proline residues, which lack a backbone amide ¹H atom, or overlapping residues in the ¹⁵N-¹H correlation spectrum that could not be integrated accurately. The experiment was recorded at a ¹H frequency of 600 MHz, using a saturation period of 3.0 s and a total interscan delay of 5.0 s.
FIG. 6. (a) Superposition of the 2-D [¹⁵N,¹H]-HSQC spectra of nsp3a(1-183) (blue) and nsp3a(1-112) (red). (b) High-contour-level presentation of a 2-D [¹⁵N,¹H]-HSQC spectrum of nsp3a(1-183). (c) Heteronuclear NOE experiment with nsp3a(1-183), using a saturation period of 3.0 s and an interscan delay of 5.0 s. Negative and positive peaks are shown in pink and green, respectively.

VOL. 81, 2007

FIG. 7. Study of the oligomeric state of nsp3a(1-112). (a) Data obtained from NMR diffusion experiments at 700 MHz. The relative NMR signal intensity (ln I/I₀) is plotted versus the square of the gradient field strength, G². ○, nsp3a(1-112); ■, ribonuclease A; ▲, chymotrypsinogen. (b) PFO-PAGE of nsp3a(1-112); the sizes of the protein complexes were estimated from the benchmark protein ladder shown on the left (Invitrogen). The protein concentration increases from right to left in three steps of 250 µM, 500 µM, and 1 mM. The filled arrowheads indicate the positions of the monomeric (12.6 kDa) and dimeric (25.2 kDa) forms of nsp3a(1-112).

RESULTS

nsp3a structure determination. The NOE cross-peaks that were unambiguously assigned in the seventh cycle of the ATNOS/CANDID/CYANA calculation (see Materials and Methods for details) yielded 1,888 meaningful upper distance limits, which were used as input for the final structure calculation with the program CYANA. The residual CYANA target function value of 1.88 ± 0.28 Å² and the average global root-mean-square deviation (RMSD) value relative to the mean coordinates of 0.77 ± 0.09 Å calculated for the backbone atoms of residues 20 to 108 in the bundle of Fig. 2a (Table 1) indicate a high-quality NMR structure determination.
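One common definition of the RMSD relative to the mean coordinates averages, over all conformers, each conformer's RMSD to the atom-wise mean structure. The Python sketch below follows that definition, assuming the conformers have already been superimposed on the selected atoms (as for the bundle in Fig. 2a), so no fitting step is included. The nested coordinate layout and function names are hypothetical, and reporting the population standard deviation is an assumed choice.

```python
import math
import statistics


def mean_coordinates(ensemble):
    """Atom-wise mean structure; ensemble[c][a] is the (x, y, z) position
    of atom a in conformer c. All conformers list atoms in the same order."""
    n_conf = len(ensemble)
    return [
        tuple(sum(conf[a][k] for conf in ensemble) / n_conf for k in range(3))
        for a in range(len(ensemble[0]))
    ]


def rmsd(coords, ref):
    """Root-mean-square deviation between two equally ordered atom lists."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(coords, ref) for k in range(3))
    return math.sqrt(sq / len(coords))


def rmsd_to_mean(ensemble):
    """Average and (population) standard deviation of the per-conformer
    RMSD to the mean coordinates. ASSUMES prior superposition."""
    mean = mean_coordinates(ensemble)
    values = [rmsd(conf, mean) for conf in ensemble]
    return statistics.mean(values), statistics.pstdev(values)
```

Restricting the atom lists to the backbone of residues 20 to 108 excludes the flexible termini, which would otherwise inflate the value.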
Solution structure of nsp3a. nsp3a(1-112) exhibits a ubiquitin-like fold with four helices and four β-strands arranged in the sequential order β1-α1-β2-α2-3₁₀-α3-β3-β4 (Fig. 1 and 2). The long helix α2 and the presence of the α1- and 3₁₀-helices, which have not been observed in other ubiquitin-like proteins, make the overall structure more elongated than other ubiquitin-related folds. Strand β1 spans residues 20 to 24 and is connected via a well-defined nine-amino-acid linker to helix α1, which contains residues 34 to 37. A short turn then leads to β2, with residues 42 to 46. Helix α2, with residues 52 to 66, is followed by a short loop that leads to the 3₁₀-helix of residues 70 to 75, which is further connected by a short turn to helix α3, of residues 79 to 84. The last two regular secondary structures, β3 with residues 89 to 91 and β4 with residues 101 to 106, form an antiparallel β-sheet, and they are connected to each other by a tight turn followed by an extended chain segment. The electrostatic potential surface of nsp3a(1-112) shows a pronounced polarity (Fig. 3), with helices α2, α3, and 3₁₀ exhibiting mainly negative charges to the solvent, while strands β1 and β3 and helix α1 contain primarily positive or hydrophobic surface residues.

FIG. 8. (a) 1-D ¹H NMR spectrum of nsp3a(1-112) before removal of copurifying nucleic acids. Spectra were measured at 25°C with water presaturation on a Bruker DRX700 spectrometer. Sixty-four scans were accumulated. The presence of characteristic nucleic acid signals in the area from 4.8 to 6.4 ppm (*) is readily apparent (the H1′, H2′, H3′, H4′, H5′, and H5″ protons of all nucleotides and the pyrimidine H5 protons are typically observed in this spectral region). (b) 1-D ¹H NMR spectrum of the nucleic acid-free nsp3a(1-112) sample used for the NMR structure determination (see Materials and Methods). The weak peaks between 4.8 and 6.4 ppm are part of the protein spectrum. (c) Isolation of RNA that copurified with nsp3a(1-112). The chromatogram was obtained after loading a sample of unfolded nsp3a(1-112) in 6 M guanidinium-HCl onto a size exclusion column. Absorbance at 280 nm and conductivity are shown in blue and brown, respectively. The protein and ssRNA absorption peaks are labeled; the high conductivity observed after 320 minutes is due to guanidinium-HCl.
nsp3a(1-112) is the second domain with a ubiquitin-like fold found within full-length nsp3. Previously, the N-terminal 70-amino-acid segment of the fourth domain of nsp3, nsp3d (or PLpro), was found to have a ubiquitin-like fold (31). In Fig. 4, regular secondary structure elements in the segment of residues 20 to 104 of nsp3a have been superimposed with the corresponding polypeptide segments in the region of residues 725 to 777 of nsp3, which corresponds to the N-terminal domain of PLpro (31). Insofar as they overlap, the two structures share the same topology as canonical ubiquitin-like proteins, such as ISG15 (24) and Bacillus subtilis YukD (41). However, nsp3a also displays unique features (Fig. 4, yellow ribbon): the connection between strands β1 and β2 in nsp3a is longer than that in nsp3d and includes helix α1, nsp3a has two additional helices inserted between strands β2 and β3, and helix α2 is much longer in nsp3a than in nsp3d.

FIG. 9. Mass spectrum of the isolated ssRNA fragment. The proposed structures for the main peaks are presented together with their corresponding molecular weights and atomic composition.

Characterization of flexible regions in nsp3a. Mobility in the two nsp3a protein constructs was investigated by heteronuclear ¹⁵N{¹H}-NOE experiments (Fig. 5 and 6). Figure 5 shows the values of the steady-state ¹⁵N{¹H}-NOE for each ¹⁵N-¹H moiety in nsp3a(1-112). Residues 20 to 108, for which the mobility of the backbone ¹⁵N-¹H moieties is essentially limited to the overall tumbling of the molecule, have positive NOE values of about 0.8. In contrast, residues 1 to 19 and 110 to 112 have values in the range of −0.4 to 0.5, indicating increased mobility for these polypeptide segments, which are also visibly less well defined in the structure (Fig. 2a).
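The steady-state ¹⁵N{¹H}-NOE of each residue is the ratio of peak intensities recorded with and without ¹H saturation. The Python sketch below flags flexible residues by that ratio. The cutoff of 0.6 is a hypothetical choice placed between the values of about 0.8 reported for the structured core and the range of −0.4 to 0.5 reported for the termini, and the input layout is invented for illustration.

```python
def het_noe(i_sat: float, i_ref: float) -> float:
    """Steady-state 15N{1H}-NOE: peak intensity with 1H saturation divided
    by the intensity in the reference spectrum without saturation."""
    return i_sat / i_ref


def flexible_residues(peaks, cutoff=0.6):
    """peaks maps residue number -> (I_sat, I_ref). Residues whose NOE falls
    below the (hypothetical) cutoff are returned in sequence order."""
    return sorted(
        r for r, (i_sat, i_ref) in peaks.items() if het_noe(i_sat, i_ref) < cutoff
    )
```

Negative ratios, as seen for the C-terminal tail of nsp3a(1-183), arise when internal motions on the sub-nanosecond time scale dominate the relaxation.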

In order to investigate the structural role of the Glu-rich subdomain of residues 113 to 183, two nsp3a variants were generated that differ in the presence or absence of the C-terminal Glu-rich region, and the 2-D [¹⁵N,¹H]-HSQC spectra of the two proteins were then compared (Fig. 6a). There are no significant changes in the chemical shifts of the resonances of residues 2 to 112 in the two proteins, which indicates that both variants contain a similarly structured globular domain. These data also show for the full-length nsp3a (Fig. 6a, blue peaks) that most of the peaks from residues 113 to 183 are in the random coil chemical shift region (¹H shifts between 7.5 and 8.5 ppm). These chemical shifts and the high intensity of these resonances compared with the peaks from the globular region (Fig. 6a and b) are indicative of a flexibly extended polypeptide segment. This is confirmed by the fact that the ¹⁵N{¹H}-NOE values for most of the peaks corresponding to residues 113 to 183 are negative (Fig. 6c, pink peaks). Thus, the C-terminal Glu-rich subdomain is best described as a flexible tail of residues 113 to 183 attached to the globular domain of residues 1 to 112.

nsp3a(1-112) is a monomer in solution. During the purifi- 
cation of nsp3a(1-112), we noticed that the retention volume 
of nsp3a by size exclusion chromatography (Superdex 75; 
Amersham) was lower than expected for a globular protein 
with a molecular mass of 12.6 kDa. In view of the implications 
for the structure determination and the biological activity of 
the protein, we decided to further investigate the oligomeric 
state of nsp3a(1-112) in solution using NMR diffusion exper- 
iments and PFO-PAGE. 

In diffusion NMR experiments, the decay of the signal in- 
tensity versus the square of the magnetic field gradient was 
used to estimate the translational diffusion properties of the 
proteins (40). In Fig. 7a we compare data obtained for 1 mM 
solutions of nsp3a(1-112), RNase A, and chymotrypsinogen, 
which have molecular masses of 12.6 kDa, 13.7 kDa, and 25.0 
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FIG. 10. Association of nsp3a(1-183) and nsp3a(1-112) purified 
from E. coli with nucleic acids. (a) Nucleic acid was visualized with 
SYBR-gold staining before or after digestion with nucleases specific to 
DNA (DNase I or T7 endonuclease) or RNA (RNase I, RNase A, or 
RNase T,). Cleavage assays were performed at 37°C for 1 h, and 
digested samples were analyzed by native electrophoresis on precast 
6% polyacrylamide gels. Open arrowheads denote copurified nucleic 
acid species associated with nsp3a(1-112) or nsp3a(1-183), respec- 
tively. (b) EMSAs were performed to estimate the RNA binding af- 
finity of nsp3a(1-112). Samples containing ssRNA1 or ssRNA2 were 
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kDa, respectively. The nsp3a(1-112) intensity decay curve is 
located between the two standards, which is indicative of the 
presence of the monomeric form, since the elongated shape of 
nsp3a(1-112) should result in a lower diffusion coefficient than 
near-spherical proteins with similar molecular masses. A PFO- 
PAGE gel also indicates that nsp3a(1-112) exists predomi- 
nantly in the monomeric form at room temperature. The as- 
says performed at the three different protein concentrations of 
1 mM, 500 pM, and 250 pM (Fig. 7b) show that even at a 1 
mM concentration the monomeric form predominates, and 
only a small amount of the dimeric form can be observed. 

nsp3a(1-112) binds ssRNA. In the initial nsp3a(1-112) pu- 
rification assays (see Materials and Methods), the protein co- 
purified with small fragments of ssRNA. These fragments were 
readily detected in the 1-D 'H NMR spectra (Fig. 8a) and were 
subsequently also observed by native PAGE analysis. In addi- 
tion to preparing the nucleic acid-free protein for the NUR 
structure determination (described in Materials and Methods), 
we also investigated the nature of the copurifying nucleic acids. 
To this end a sample of nucleic acid-loaded nsp3a(1-112) was 
unfolded in 6 M guanidinium-HCl solution, and the mixture 
was subsequently loaded onto a Superdex 75 size exclusion 
column (Fig. 8c). The mass spectrometry analysis of the iso- 
lated fragments allowed us to identify an RNA component 
with a molecular weight of 1327.3 (Fig. 9). The different peaks 
found in this spectrum are consistent with the sequences AU, 
GAU, and GAUA, with the longest component corresponding 
to GAUA. 

Nuclease digestion assays of protein samples containing co- 
purifying nucleic acids further revealed that the major species 
associated with nsp3a(1—183) was DNA, which could be com- 
pletely removed by DNase I treatment (Fig. 10a), and that the 
shorter form of the protein retained a much smaller nucleic 
acid species that was partly susceptible to RNase A digestion 
and was not susceptible to RNase I or T1 digestion (Fig. 10a). 
The incomplete digestion by RNase A and the lack of cleavage 
by RNase I or Tl were interpreted as an indication of the 
formation of a robust protein-RNA complex. 

We then went on to study the binding of exogenous ssRNA substrates to nsp3a(1-112), starting from the aforementioned observation that the endogenous RNA contained the predominant trinucleotide sequence AUA. We thus designed two AUA-containing ssRNA fragments for further studies, ssRNA1 with the sequence AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU and ssRNA2 with the sequence GGGGAUAAAA. The binding of nsp3a(1-112) to these two ssRNAs was assessed by EMSA, using RNA-free protein prepared as described in Materials and Methods. The EMSA showed that nsp3a(1-112) bound to the two ssRNA substrates with similar affinity (Fig. 10b). Measurement of the percentage of bound ssRNA1 at variable concentrations of nsp3a(1-112) (Fig. 10c) allowed us to estimate the dissociation constant of the nsp3a(1-112)-ssRNA1 complex to be approximately 20 μM. In control experiments, no binding was observed with several single-stranded DNA (ssDNA) sequences or with double-stranded RNA sequences containing the motif AUA (Fig. 11). Furthermore, no binding could be detected to smaller ssRNAs, such as fragments containing only G and U, or to Octa-A, Octa-C, Octa-G, and Octa-U. Thus, RNA binding by nsp3a is consistent with the profile of a sequence-sensitive ssRNA-binding protein.

incubated at 37°C for 1 h with variable concentrations of protein and analyzed by native electrophoresis on precast 6% polyacrylamide gels. RNA was detected by SYPRO Gold poststain, and the fraction of bound RNA was calculated relative to the maximum binding observed in each experiment. Lane P, protein only; lanes 0, ssRNA only; lanes 1 to 7 (left panel), ssRNA with twofold dilutions of protein from a final concentration of 128 μM to 2 μM for ssRNA1; lanes 2, 4, 6, and 8 (right panel), ssRNA with fourfold dilutions of protein from 64 μM to 1 μM for ssRNA2. Electrophoretic mobilities of free (f) and bound (b) forms of each ssRNA species are indicated with arrowheads. (c) ssRNA1 binding at variable concentrations of nsp3a(1-112), as calculated from the EMSA data shown in panel b.

FIG. 11. EMSAs were performed to evaluate the affinity of nsp3a(1-112) for different nucleic acid species. (a) Gels obtained after loading mixtures of nsp3a(1-112) with 10 different ssDNA fragments (1 to 10). Lanes labeled P and M correspond to nucleic acid-free protein and nucleic acid marker, respectively. Comparison of the two gels, using nucleic acid-specific (left) and protein-specific (right) stains, indicates that nsp3a(1-112) does not exhibit affinity for ssDNAs. (b) Gels containing decreasing concentrations (100 to 1.6 μM) of nsp3a(1-112) in the presence of 800 ng of an ssRNA 40-mer lacking the sequence AUA (left), a double-stranded RNA 20-mer (center), and an ssDNA 40-mer (right). In lanes labeled N, only nucleic acid species were loaded. No interaction between nsp3a(1-112) and nucleic acids (NA) was observed under any of the above conditions. All experiments were performed after incubation of nsp3a(1-112) and the corresponding nucleic acid fragment for 1 h at 37°C.
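The approximate Kd of 20 μM derived from Fig. 10c can be illustrated with a simple single-site binding model. The Python sketch below is not the authors' fitting procedure, which is not described here. It assumes that nsp3a(1-112) is in large excess over ssRNA1 (so free protein is approximately total protein) and that the maximum observed binding corresponds to saturation, since the legend states that fractions were normalized to the maximum binding in each experiment. Only the 128 to 2 μM twofold dilution series comes from the legend; the function names and the noise-free fractions are hypothetical and serve only to check that the fit recovers its input.

```python
import math

def fraction_bound(p_um: float, kd_um: float) -> float:
    """Single-site model with protein in excess: theta = P / (Kd + P)."""
    return p_um / (kd_um + p_um)

def fit_kd(conc_um, frac, kd_min=0.1, kd_max=1000.0, steps=4000):
    """Least-squares Kd (in uM) by a log-spaced grid search (stdlib only)."""
    best_kd, best_sse = None, math.inf
    log_lo, log_hi = math.log10(kd_min), math.log10(kd_max)
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        # Sum of squared residuals between measured and modeled fraction bound
        sse = sum((f - fraction_bound(p, kd)) ** 2 for p, f in zip(conc_um, frac))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

# Protein concentrations of the ssRNA1 titration (twofold dilutions, 128 to 2 uM)
conc = [128, 64, 32, 16, 8, 4, 2]
# Hypothetical noise-free fractions generated with Kd = 20 uM (not measured data)
frac = [fraction_bound(p, 20.0) for p in conc]
print(f"estimated Kd ~ {fit_kd(conc, frac):.1f} uM")
```

A grid search keeps the sketch dependency-free; with real, noisy gel data a nonlinear least-squares routine and an explicit treatment of the normalization step would be more appropriate.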

Following up on these results, NMR chemical shift perturbation studies were performed in order to map the regions of nsp3a(1-112) that are affected by the interaction with ssRNA2. Figure 12a and b show the effect of the addition of ssRNA2 on the chemical shift of each residue in nsp3a(1-112). The residues with large chemical shift perturbations are all located on the same surface area of the protein. It is rather surprising that this contact region comprises the two loops linking β3 and β4 and linking β1 and α1, as well as the helices α1 and 3₁₀, which contain a surplus of negatively charged amino acid side chains (Fig. 3), since such side chains would be expected to repel the polyanionic RNA backbone. These chemical shift perturbations might therefore result primarily from long-range effects on the protein conformation rather than from direct protein-RNA contacts.

nsp3a did not interact with other ssRNA species tested. For example, the superposition of the [¹⁵N,¹H]-HSQC spectra of nsp3a in the presence and absence of Octa-U (Fig. 12c) does not show any significant chemical shift differences, indicating that Octa-U does not bind to the protein and supporting the idea that the interaction of nsp3a(1-112) with ssRNA is sequence specific.
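The per-residue values plotted in Fig. 12b are weighted averages of the amide ¹H and ¹⁵N shift changes, with the ¹⁵N term scaled by 0.2 as defined in the Fig. 12 legend. The Python sketch below computes that quantity from two assigned peak lists and flags residues above a cutoff. The peak-list format, the function names, the residue numbers and shift values in the usage example, and the threshold are all hypothetical; the text does not state which cutoff was used to call a perturbation "large".

```python
import math

def weighted_csp(d_hn_ppm: float, d_n_ppm: float) -> float:
    """Combined amide shift change {0.5[dH^2 + (0.2 dN)^2]}^(1/2), in ppm."""
    return math.sqrt(0.5 * (d_hn_ppm ** 2 + (0.2 * d_n_ppm) ** 2))

def perturbed_residues(free, bound, threshold_ppm):
    """Return {residue: combined shift change} for changes above threshold_ppm.

    free, bound: dicts mapping residue -> (1H shift, 15N shift) in ppm for the
    same assigned backbone amide peaks without and with ssRNA.
    """
    hits = {}
    for res, (h0, n0) in free.items():
        if res not in bound:
            continue  # peak not assigned or not observed in the bound spectrum
        h1, n1 = bound[res]
        d = weighted_csp(h1 - h0, n1 - n0)
        if d > threshold_ppm:
            hits[res] = d
    return hits

# Toy peak lists (hypothetical values, not data from Fig. 12)
free = {5: (8.50, 120.0), 6: (8.00, 110.0)}
bound = {5: (8.60, 121.0), 6: (8.01, 110.1)}
print(perturbed_residues(free, bound, threshold_ppm=0.05))
```

Scaling the ¹⁵N change by 0.2 roughly compensates for the larger spread of nitrogen shifts, so that both nuclei contribute on a comparable scale.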


DISCUSSION 


nsp3a is well conserved within different SARS-CoV sequences but exhibits low sequence identity (<35%) to other CoV nsp3 proteins. The closest sequence homologues of the globular domain of nsp3a are found in the replicase polyproteins of porcine hemagglutinating encephalomyelitis virus and murine hepatitis virus (Fig. 1). For example, the sequences in strands β3 and β4 are well conserved among all group 2 CoVs, including SARS-CoV, while the region containing the 3₁₀- and α3-helices is less well conserved, and helix α3 actually appears to be absent in the group 1 and group 3 CoVs. Additionally, the regions corresponding to β1 and α1 in nsp3a exhibit a high number of conservative amino acid substitutions. It is worth mentioning that β1, α1, and β4 define the positively charged surface areas of nsp3a (Fig. 3, right-hand panel). The α1- and 3₁₀-helices, which have not been observed in other ubiquitin-like proteins, seem to be important for the interaction of nsp3a with ssRNAs, since they exhibit extensive chemical shift perturbations upon ssRNA interaction and since other ubiquitin homologues do not exhibit RNA binding activity.
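Percent-identity figures such as the <35% quoted above depend on the convention used to compute them. The Python sketch below assumes one common convention (identical positions divided by alignment columns in which neither sequence has a gap) applied to a pre-computed pairwise alignment. It is illustrative only; the alignment in the usage line is a hypothetical toy, not an nsp3 sequence, and the authors' alignment method is not specified here.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical positions / columns where neither
    sequence has a gap. Other conventions (e.g., dividing by the shorter
    ungapped length) give different numbers.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not cols:
        return 0.0
    same = sum(a == b for a, b in cols)
    return 100.0 * same / len(cols)

# Hypothetical toy alignment: 5 identities over 7 ungapped columns
print(percent_identity("MKV-LEEA", "MRVQLDEA"))
```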

FIG. 12. (a) Superposition of the [¹⁵N,¹H]-HSQC spectra of nsp3a(1-112) in the absence (blue) and presence (red) of a fourfold excess of the exogenous ssRNA2 (see text). (b) Plot versus the amino acid sequence of the chemical shift changes in the backbone ¹H-¹⁵N moieties of nsp3a(1-112) due to ssRNA2 binding. $\Delta\delta_{av}$ is a weighted average of the ¹H and ¹⁵N chemical shift differences determined from

Although the observed affinity of nsp3a for ssRNA cannot by itself define a unique biological function, it seems to be important for the overall biological role of nsp3. As indicated above, nsp3 is a large multidomain protein, and only two of its domains, nsp3b and nsp3d, have been structurally and functionally characterized to date. The analysis of these domains indicates that nsp3 is a multifunctional protein involved in multiple
biological processes, such as proteolysis (31) and RNA processing (34). The fact that the presently studied N-terminal region of nsp3 and two of its other domains, nsp3c and nsp3e, exhibit RNA binding activity (B. W. Neuman et al., unpublished data), together with the ADP-ribose-1″-phosphate dephosphorylation activity of nsp3b (34), suggests that this protein could also be involved in the replication and processing of viral RNA. Although the short sequences AUA and GAUA are common in the genome, a possible biological function for the sequence-specific RNA-binding activity observed for nsp3a might be binding to the 5′ end of the SARS-CoV genome. The sequence AUA occurs several times in the 5′ untranslated region (UTR) of the genome, including at the extreme 5′ end. Proteins that specifically recognize the 5′ UTR might function in cap-dependent translation or, alternatively, in genome replication or subgenomic RNA synthesis.
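Locating AUA and GAUA sites in a sequence, for example in the two designed ssRNAs or in a 5′ UTR sequence supplied by the reader, is a short string-scanning task. The Python sketch below reports 0-based start positions of overlapping matches; the function name is hypothetical, and the two sequences are copied from the Results section as printed.

```python
import re

def motif_positions(seq: str, motif: str) -> list[int]:
    """0-based start positions of motif in seq, counting overlapping matches."""
    # A zero-width lookahead lets matches overlap (e.g., 'AUAUA' contains AUA twice).
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

ssrna1 = "AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU"
ssrna2 = "GGGGAUAAAA"
for name, seq in (("ssRNA1", ssrna1), ("ssRNA2", ssrna2)):
    print(name, "AUA:", motif_positions(seq, "AUA"), "GAUA:", motif_positions(seq, "GAUA"))
```

The lookahead matters here: ssRNA1 contains the overlapping stretch AUAUA, which `str.count("AUA")` would count only once.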

The observation of two ubiquitin-like structures within nsp3 
(nsp3a and the N-terminal domain of nsp3d) has important 
implications in attempting to assign its likely biological func- 
tion. In addition to being a cysteine protease, nsp3d is also a 
potent deubiquitinating enzyme that has been extensively stud- 
ied (2, 3, 31). It has been speculated earlier that the ubiquitin- 
like domain of SARS-CoV nsp3d might act as a decoy for 
cellular ubiquitinating enzymes, thereby protecting nascently 
synthesized viral proteins from proteasome-mediated degrada- 
tion. Alternatively, the two ubiquitin-like domains might be 
involved in modulation of protein-protein interaction pathways 
of cellular immunomodulators, such as interferons and 
ISGylating enzymes. This view is reinforced by the structural 
similarity of the two ubiquitin-like domains of nsp3 with 
ISG15, the product of an interferon-stimulated gene that is induced as a primary response to diverse stimuli, including viral infections. The
SARS-CoV proteins 3b and 6 and the nucleocapsid protein 
have recently been shown to function as effective interferon 
antagonists (16). 

It seems possible that other SARS-CoV proteins, such as 
nsp1 (13) (and possibly host proteins as well) might also be 
part of these pathways, acting at either the RNA or protein 
levels. Several studies probing the intricate interplay of viral 
and host proteins during the progression of the SARS-CoV 
viral cycle have been reported (22, 33, 39). Since the biological 
role of nsp3a still remains unclear, structural homology studies 
could at this point provide insights into the potential function 
of this domain and its role within the viral cycle. 

comparison of the [¹⁵N,¹H]-HSQC spectra shown in panel a: $\Delta\delta_{av} = \{0.5[\Delta\delta(\mathrm{H^N})^2 + (0.2\,\Delta\delta(^{15}\mathrm{N}))^2]\}^{1/2}$. (c) Superposition of the [¹⁵N,¹H]-HSQC spectra of nsp3a(1-112) in the absence (blue) and presence (red) of a fourfold excess of Octa-U.

positions of the conserved residues corresponding to R23 in nsp3a(1-112) are indicated. (d) Ribbon presentations of the same structures as in panel c.

nsp3a exhibits 3-D structural similarity with Ras-interacting domains. Many of the structural homologues of nsp3a interact with other polypeptides to regulate processes such as protein degradation, cell signaling (12), and antiviral response (24). It seems significant that five of them are Ras-interacting proteins. Based on the primary sequences, the ubiquitin α/β-roll superfold comprises five families (14). Members of three of these families, RA (RalGDS/AF6 Ras-association domain), RBD (Raf-like Ras-binding domain), and PI3K_rbd (Ras-binding domain of phosphatidylinositol 3-kinase-like proteins), interact with Ras (14). A large fraction of the structural homologues of nsp3a(1-112) identified using the software DALI are members of these families. The protein nsp3a(1-112) has the highest structural similarity with the Ras-interacting domain (RID) of RalGDS, a member of the RA family with which it shares the topology of the ubiquitin-like fold (Fig. 13d). This Ras effector stimulates guanine nucleotide dissociation specifically from Ral. RID-RalGDS binds Ras through its C-terminal domain and shows low sequence identity with other Ras-interacting proteins but similar hydrophobic profiles (12). The superposition of the 3-D structures of RID-RalGDS and nsp3a(1-112) reveals a region with conserved residues located in strand β1 of nsp3a(1-112) (Fig. 13a and b) that is intimately involved in the Ras contact interface. Similarly, the Ras-binding domain of the AF6 protein (29), which is also a member of the RA family, shows 3-D structural homology with nsp3a(1-112) (Fig. 13d) and similar residues located in the β1 region (Fig. 13b). Both RalGDS and AF6 are known as Ras effectors. Similar patterns are also found in other RA domains with significant levels of structural homology with nsp3a(1-112), e.g., the human Grb7 protein and the guanine nucleotide exchange factor for Rap1 (25).
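DALI detects structural homologues by comparing intramolecular Cα distance matrices rather than raw coordinates. The Python sketch below is not the DALI scoring function; it only illustrates the underlying idea with a simpler measure, the root-mean-square difference between the distance matrices of two residue-by-residue aligned Cα coordinate sets, which is unchanged by any rigid rotation or translation. The function names and the four toy coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances between C-alpha positions (in angstroms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def dm_rms_difference(coords_a, coords_b):
    """RMS difference between the distance matrices of two aligned coordinate lists.

    Invariant to rigid rotation/translation, so no prior superposition is needed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two aligned coordinate lists of equal length >= 2")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum((da[i][j] - db[i][j]) ** 2 for i, j in pairs) / len(pairs))

def rotate_z(p, angle):
    """Rotate point p about the z axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)

# Hypothetical toy C-alpha trace, then a rotated and translated copy of it
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.5, 0.0), (4.0, 6.0, 2.5)]
moved = [tuple(v + t for v, t in zip(rotate_z(p, math.pi / 3), (10.0, -2.0, 5.0)))
         for p in toy]
print(round(dm_rms_difference(toy, moved), 6))  # 0.0: same shape, different pose
```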

In general, Ras domains contain a combination of hydro- 
phobic and acidic residues that interact with hydrophobic and 
positive groups on RIDs. Both nsp3a and the different, aforementioned Ras-interacting proteins exhibit these characteristics (Fig. 13c). In nsp3a, the conserved basic residue R23 is located in strand β1, which exhibits a high degree of consensus with the RA family sequence (Fig. 13b). This suggests that nsp3a could interfere in biological processes that involve Ras. Given its high degree of similarity with RA family proteins, nsp3a might interact with human Ras proteins during SARS-CoV infection. Ras family proteins act as molecular switches that cycle between inactive GDP-bound and active GTP-bound states. In this manner, Ras family proteins control cell growth, motility, intracellular transport, and differentiation. The fundamental role of Ras in cell cycle progression from phase G₀ to G₁ has been extensively reported (6, 27). Molecular interactions that result in Ras inactivation prevent cell progression to the G₁ phase. In this context, murine hepatitis virus is able to induce cell cycle arrest in the G₀/G₁ phase during the lytic infection cycle (4). It has also been shown that some SARS-CoV proteins are able to induce apoptosis or G₀/G₁ arrest in transfected cells (17, 43, 44). Overall, the structural similarity of nsp3a(1-112) and RIDs thus leads us to hypothesize that nsp3a may have a physiological role in cell cycle arrest.

Structure and potential functional role of the C-terminal 
Glu-rich subdomain. The Glu-rich C-terminal polypeptide 
segment 113 to 183 of nsp3a shows less than 25% sequence 
identity with the corresponding acidic regions in other CoV 
genomes, whereas the SARS-CoV protein contains overall a 
somewhat higher percentage of acidic residues than the acidic 
regions of other CoVs. Similar motifs are found in some eu- 
karyotic proteins (11, 23). In mammals, these acid-rich 
polypeptide segments are mainly involved in the transport of 
mRNA from the nucleus to the cytoplasm by association with 
RNA binding proteins. For example, the pp32/leucine-rich 
acidic protein associates with HuR, which binds to AU-rich 
elements of mRNAs to export mRNAs from the nucleus to the 
cytoplasm (11). Interestingly, Higashino et al. reported that 
several viruses interfere with this transport in order to increase 
the production of their virions by the cellular machinery (11). 

The NMR data of Fig. 5 and 6 now show that the Glu-rich 
region of nsp3a forms a flexible tail attached to the globular 
region of residues 1 to 112. Although sequence similarity to 
other proteins is not identifiable, several cellular proteins also 
contain regions with high percentages of acidic residues. The 
homopolymer of glutamic acid, poly-L-Glu, is unstructured at 
pH 8 but can adopt helical structures at pH 5 (15). Some acidic 
regions of polypeptides exhibit a well-defined regular secondary structure when interacting with other proteins. The structure of the complex of Ran with RanBP1 and RanGAP (35) presents examples of both situations. Ran is a nuclear Ras-related protein that regulates both transport between nucleus
and cytoplasm and the formation of the mitotic spindle or 
nuclear envelope in dividing cells (35). The C-terminal region 
of RanGAP, which is important for binding affinity, exhibits an 
acidic motif that is flexibly disordered in both the complexed 
and the uncomplexed forms of RanGAP, whereas other acid-rich 
segments of this protein comprise folded secondary structure elements. Given the high degree of similarity of nsp3a(1-112) with some of the other polypeptides involved in protein-protein interaction processes, as well as its location in the large multidomain protein nsp3, the long, flexibly extended Glu-rich segment could have an important role in interactions with other SARS-CoV or host cell molecules, and this domain might
adopt a well-defined fold during interactions with other 
polypeptides. Overall, the structural data reported in this pa- 
per indicate that the globular and nonglobular subdomains of 
nsp3a are important for SARS-CoV infection and persistence 
and thus represent new potential targets for therapeutic inter- 
vention. 